 /*
 
 This file merges in data from different sources to create data for analysis.
 

 Inputs
      data
	       a) DataFromCompany.dta is data from the company servers, recoded there, and then converted to stata format. 
			An observation (row) is an individual (identified by ID) by week (Week).	
				Variables include
				i)    ID is a unique identification for each individual.
				ii)   Week identifies week. Coded 1 for first week of experiment through week 24 and includes weeks -3, -2, -1, 0 
						before the experiment.
				iii)  manager identifies the individual's manager. Coded 1--55.
				iv)   brand identifies the individual's division. 
				v)    RPC is revenue per call. This variable is calculated by the company and used internally at the company for
						commissions.
				vi)   RPH is revenue per hour. This variable is calculated by the company.
				vii)  Revenue is revenue as defined by and calculated by the company.
				viii) Treatment identifies randomized group. Coded 1 for internal control 2 for pair incentives 3 for structured
						meetings 4 for combined and 5 for external control.
				ix)   Partner is ID of the randomized person that was assigned to an individual (missing for external control that 
						was not assigned partner).
				x)    age is age of the individual
				xi)   female is 1 if female 0 else
				xii)  tenure is how many weeks an individual has been with the firm
				xiii) commission pre-tax take home commissions
				xiv)  calls is total calls taken by an individual
				xv)   term is an indicator if individual ended employment (voluntary or otherwise)
				xvi)  adherence is a percent of time clocked in and able to take calls
				xvii) phonehours is number of hours on the phone or waiting for calls
		   b) locationData.dta is data on location of workers from the company
				Variables include
				i) ID
				ii) location
           c) Worksheet Data_updated.dta is data compiled from agent worksheet responses. An observation (row) is a unique
			individual (identified by ID).
		   		Variables include:
				i)    ID is a unique identification for each individual. 
				ii)   Knowledge_Received_Any is a binary variable that indicates where an agent ever received knowledge advice 
						from her/her parter (1) or not (0).       
				iii)  Support_Received_Any is a binary variable that indicates where an agent ever received support advice 
						from her/her parter (1) or not (0). 
				iv)   Other_Received_Any is a binary variable that indicates where an agent ever received other advice 
						from her/her parter (1) or not (0). 
				v)    Mixed_Received_Any is a binary variable that indicates where an agent ever received mixed 
						(knowledge and support) advice from her/her parter (1) or not (0). 
				vi)   Knowledge_alt_Received_Any is akin to Knowledge_Received_Any, but it incorporates the additional 
						manually classified worksheet responses.
				vii)  Support_alt_Received_Any is the akin to Support_Received_Any, but it incorporates the additional manually
				viii) Other_alt_Received_Any is the akin to Other_Received_Any, but it incorporates the additional manually
				ix)   Mixed_alt_Received_Any is the akin to Mixed_Received_Any, but it incorporates the additional, 
						manually classified worksheet responses.
		   d) Survey Response Data.dta is data from survey responses from agents before, during, and after the intervention. 
				An observation (row) is a unique individual (identified by ID).
				Variables include:
				i)    ID is a unique identification for each individual. 
				ii)   HowConnectedDoYouFeel is responses to the question "How connected do you feel to others at Company?"      
				iii)  HowManyWorkRelatedInteractions is responses to the question "During an average full week of work, how many 
						times do you actually reach out to others"
				iv)   HowBeneficialAreTheseInteract is responses to the question "On average, when you reach out to others at 
						Company about individual calls or selling, how beneficial" 
				v)    WhatDollarValueWouldYou is responses to the question "Please value the following 4 prizes: [Prize 1, Prize 2, 
						etc.]" 
				vi)   IWasAwareOfTheExperimentThat is responses to the question "Do you know about the [name] sales tournament that 
						just wrapped up at [firm name]?" 
				vii)  WeTurnedInACompletedWorksheet is responses to the question "Did you turn in a worksheet (may have been 
						gold colored) in [week number]?"
				viii) ISpentBlankMinutesWithMy is responses to the question "About how much time did you spend with your 
						[week number] partner talking about the worksheet?"
				ix)   TheseInteractionsWithMyPartner is responses to the question "How beneficial was the conversation around 
						the [week number] worksheet to you?"
		   e) Winners Data.dta is data from the pair incentives of weekly winners. An observation (row) is an individual 
			(identified by ID) by week (Week).
		        Variables include:
				i)   ID is a unique identification for each individual.
				ii)  Week identifies week. Coded 1 for the first week of experiment through week 4, the last week.
				iii) Winner equals 1 if the individual was part of a pair that won in the given week. 
		   f) Commission Data (Treatment).dta---data on commissions from the company
				Variables include:
				i) ID
				ii) commissions---supplements data in DataFromCompany
		   g) Calls and Hours.dta---data from the company on number of calls and hours per worker
				Variables include:
				i) ID
				ii) calls---supplements data in DataFromCompany
				iii) hours---supplements data in DataFromCompany
 	   
 Outputs
     data
	       a) DataforAnalysis---used all subsequent files

 */
 
 
 
 
clear
set more off

*set paths 
	cd  
	global datapath  
	global tablepath  
	global figurepath  
    global logpath

 
use "$datapath\DataFromCompany.dta", clear 

*------------------------------------------------------------------------------*
*                    Merge in supplemental data
*------------------------------------------------------------------------------*
 
**Add anonymized location data
	sort ID
	merge n:1 ID using  "$datapath\locationData.dta"
	drop if _merge == 2
	drop _merge
 
**Add in Worksheet Data
	sort ID
	merge n:1 ID using "$datapath\Worksheet Data_updated.dta"	
		drop if _merge == 2
		gen WorksheetMerge = _merge
		drop _merge
		
**Add in survey responses	
	sort ID
	merge n:1 ID using "$datapath\Survey Response Data.dta"
		drop if _merge == 2
		gen SurveyMerge = _merge
		drop _merge
		
**Add in winners	
	sort ID Week
	merge 1:1 ID Week using "$datapath\Winners Data.dta"
		drop if _merge == 2
		gen WinnersMerge = _merge
		drop _merge
		
**Add in commission 
	*this data adds in commission data we were initially missing from DataFromCompany
	merge 1:1 ID Week using	"$datapath\Commission Data (Treatment).dta"
	drop if _merge == 2
	replace commission = Commission if commission ==.
	drop Commission
	drop _merge
	
**Add in calls and hours	
	merge 1:1 ID Week using "$datapath\Calls and Hours.dta"
	drop if _merge ==2
	drop _merge
	 
*------------------------------------------------------------------------------*
*                          Define Variables 
*------------------------------------------------------------------------------*
*Log revenue per hour
	gen lrpc = log(RPC)
	
*Log revenue per hour
	gen lrph = log(RPH)

*Periods
	gen periodn3_0 	= (Week >= -3 & Week <= 0)
	gen period1_4 	= (Week >= 1 & Week <= 4)
	gen period5_11 	= (Week >= 5 & Week <= 11)
	gen period12_18 = (Week >= 12 & Week <= 18)
	gen period19_24 = (Week >= 19 & Week <= 24)
	gen period19	= (Week >= 19)
	gen period5_24	= (Week >= 5 & Week <= 24)
	
*Balanced Panel 
	bysort ID: egen term_ever = max(term)
	bysort ID: egen balancesample = max(period19)
	
*High performer
	*Calculate the average RPC for each individual before the experiment (weeks -7 to 0)
		bysort ID brand: egen temp_RPC_n70_temp = mean(RPC) if Week>=-7 & Week<=0 & Treatment !=5
		bysort ID brand: egen temp_RPC_n70 = mean(temp_RPC_n70_temp)
		drop temp_RPC_n70_temp
		replace temp_RPC_n70 = -1 if temp_RPC_n70==. & Treatment!=5 // if missing data -7 to 0 then designate as a low performer
		sort ID temp_RPC_n70
		by ID: gen use_n70 = _n==1 & Treatment !=5

		gen temp_st_n70 =.
		levelsof brand, local(Brandlevels)
		foreach i of local Brandlevels {
			sum temp_RPC_n70 if brand==`i' & use_n70==1, d
			replace temp_st_n70 = r(p50) if brand==`i'
		}
		
	*generate variable that identifies high performer 
		gen temp_highperformer_n70 = (temp_RPC_n70 > temp_st_n70) if  temp_RPC_n70<.
		bysort ID: egen highperformer = max(temp_highperformer_n70)

	*Check performance of individuals that cannot be classified.	
		gen temp_hp_w_missing = 2 if highperformer==.
		replace temp_hp_w_missing = highperformer if highperformer<.
		gen temp_period = 0 if Week<1
		replace temp_period = 1 if Week>=1 & Week<5
		
	*show diagnostics for high performers
		disp "Diagnostic check on high performers"
		table temp_hp_w_missing temp_period if Treatment!=5, c(freq mean lrpc mean tenure median tenure)
		drop temp_period temp_hp_w_missing
 
*generate variable that identifies individuals paired with high performers
	gen highperformerpartner  = .
	levelsof ID, local(IDlevels) 
	foreach i of local IDlevels {
		sum highperformer if ID == `i'
		replace highperformerpartner  = r(mean) if Partner == `i'
	}
	bysort ID: egen highperformerpartner_ever = max(highperformerpartner )
	
	*drop temporary variables
		drop temp*

	 rename highperformerpartner high_treat
	 gen highperformerpartner = .
	 replace highperformerpartner = highperformerpartner_ever if Week <1 | Week > 4
	 replace highperformerpartner = high_treat if Week >0 & Week < 5
	 
	 replace highperformer = 0 if highperformer ==.
	 replace highperformerpartner = 0 if highperformerpartner ==.
	 
	*generate interaction variable
		gen hp_post = highperformerpartner*(Week>4)

*generate variable that identifies rotate partners
	bysort ID: egen rotate_partner = sd(Partner)
	replace rotate_partner = (rotate_partner !=0)

*generate interaction variables
	foreach i in Knowledge_Received_Any Support_Received_Any Other_Received_Any Mixed_Received_Any Knowledge_alt_Received_Any Mixed_alt_Received_Any Knowledge_Y_Received_Any Mixed_Y_Received_Any	Support_Y_Received_Any Other_Y_Received_Any {
		gen `i'_Post = `i' * period5_24
	}
	 
	gen worksheetsample = ((Treatment == 3 | Treatment == 4) & (periodn3_0 == 1 | period5_24 == 1))
	
*Diff in Diff specifications
**Difference-in-differences indicator variables for treatment period
	gen internalcontrol 	= period1_4*(Treatment==1)
	gen pairincentive 		= period1_4*(Treatment==2)
	gen structuredmeeting 	= period1_4*(Treatment==3)
	gen combined 			= period1_4*(Treatment==4)
	
	label var internalcontrol 	"Internal Control" 
	label var pairincentive 	"Pair-Incentives" 
	label var structuredmeeting "Structured-Meetings" 
	label var combined  		"Combined" 
	
**Difference-in-differences indicator variables for post period	
	gen internalcontrolpost 	= period5_24*(Treatment==1)
	gen pairincentivepost 		= period5_24*(Treatment==2)
	gen structuredmeetingpost 	= period5_24*(Treatment==3)
	gen combinedpost			= period5_24*(Treatment==4)
	
	label var  internalcontrolpost 	"Internal Control" 
	label var pairincentivepost 	"Pair-Incentives" 
	label var structuredmeetingpost "Structured-Meetings" 
	label var combinedpost  		"Combined" 
	
**Triple differences indicator variables for post period	
	gen internalcontrol_high 	=  highperformerpartner*internalcontrol
	gen pairincentive_high 		=  highperformerpartner*pairincentive
	gen structuredmeeting_high 	=  highperformerpartner*structuredmeeting
	gen combined_high 			=  highperformerpartner*combined
	
	label var internalcontrol_high 		"Internal Control $\times$ High Performing Partner" 
	label var pairincentive_high 		"Pair-Incentives $\times$ High Performing Partner" 
	label var structuredmeeting_high 	"Structured-Meetings $\times$ High Performing Partner" 
	label var combined_high  			"Combined $\times$ High Performing Partner" 


	gen internalcontrol_post_high 	=  highperformerpartner*internalcontrolpost
	gen pairincentive_post_high 	=  highperformerpartner*pairincentivepost
	gen structuredmeeting_post_high =  highperformerpartner*structuredmeetingpost
	gen combined_post_high 			=  highperformerpartner*combinedpost
	
	label var  internalcontrol_post_high 	"Internal Control $\times$ High Performing Partner" 
	label var pairincentive_post_high 		"Pair-Incentives$\times $ High Performing Partner" 
	label var structuredmeeting_post_high 	"Structured-Meetings $\times$ High Performing Partner" 
	label var combined_post_high  			"Combined $\times$ High Performing Partner" 

**Collapse by manager
    bysort manager Week: gen uniqueManagerWeek = (_n ==1) // to only count one observation per manager Week
	bysort manager Week: egen lrpc_manager = mean(lrpc)
	bysort manager Week: egen revenue_manager = mean(Revenue)
 
**unique individual for term (attrition) analysis
****used in term column table 4
	bysort ID: gen uniqueID = 1 if _n ==1  
	gen t2 = (Treatment ==2)
	gen t3 = (Treatment ==3)
	gen t4 = (Treatment ==4) 

 save "$datapath\DataforAnalysis.dta", replace
